Abstract:
Advances in multimedia data acquisition and storage technology have led to the growth of very large multimedia databases. Analyzing this huge amount of multimedia data to discover useful knowledge is a challenging problem. This challenge has opened the opportunity for research in Multimedia Data Mining (MDM). Multimedia data mining can be defined as the process of finding interesting patterns from media data such as audio, video, image and text that are not ordinarily accessible by basic queries and associated results. The motivation for doing MDM is to use the discovered patterns to improve decision making. MDM has therefore attracted significant research efforts in developing methods and tools to organize, manage, search and perform domain-specific tasks for data from domains such as surveillance, meetings, broadcast news, sports, archives, movies, medical data, as well as personal and online media collections. This paper presents a survey on the problems and solutions in Multimedia Data Mining, approached from the following angles: feature extraction, transformation and representation techniques, data mining techniques, and current multimedia data mining systems in various application domains. We discuss the main aspects of feature extraction, transformation and representation techniques: the level of feature extraction, feature fusion, feature synchronization, feature correlation discovery and accurate representation of multimedia data. A comparison of MDM techniques with state-of-the-art video processing, audio processing and image processing techniques is also provided. Similarly, we compare MDM techniques with state-of-the-art data mining techniques involving clustering, classification, sequence pattern mining, association rule mining and visualization. We review current multimedia data mining systems in detail, grouping them according to problem formulations and approaches.
The review includes supervised and unsupervised discovery of events and actions from one or more continuous sequences. We also provide a detailed analysis of what has been achieved and what the remaining gaps are where future research efforts could be focused. We then conclude this survey with a look at open research directions.
Abstract:
In an increasingly data-driven world, large volumes of fine-grained data are infiltrating all aspects of our lives. The world of education is no exception to this phenomenon: in classrooms, we are witnessing an increasing amount of information being collected on learners and teachers. Because educational practitioners have so much contextual and practical knowledge about classroom management, we argue that data-mining workflows should be co-designed with them. This paper describes a class on Multimodal Learning Analytics taught to graduate students in education who used to be (or are planning to become) educators, teachers, or school administrators, and who have little to no technical background. The course was designed to provide novices with career-relevant hands-on activities and facilitate personal engagement with data collection and analysis. We provide examples of student-created data mining workflows and the trajectory students followed to gain a foundational understanding of data mining. Finally, we present survey data illustrating the strengths and weaknesses of the assignments and projects used in the class. We conclude with lessons learned and recommendations for implementing such a course at other institutions.
Abstract:
Currently, the internet is increasingly popular, and more people are used to sharing their feelings about various things online. Online product marketing information is also growing. How to mine the required information from this massive volume with the support of big data technology has become a major problem. Therefore, based on text mining of online product marketing information, this work discusses text preprocessing methods and the temporal convolutional network (TCN), which builds on the convolutional neural network (CNN). Moreover, on this basis, a multimodal attention mechanism (AM) and a cross-modal transformer structure are added to build an AM-based TCN (AM-TCN) model to analyze the multimodal emotion of online product marketing information. The results show that the accuracy of the AM-TCN model is 2.88% higher than that of the TCN model alone, and its F1 score is 3.47% higher. Moreover, the accuracy of AM-TCN is 1.22% higher than that of the next-best model, the recurrent multistage fusion network, and its F1 value is 0.95% higher.
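The causal, dilated convolution at the core of a TCN can be illustrated with a minimal sketch; the input sequence and kernel below are made up for illustration, and this is only the basic building block, not the AM-TCN model itself:

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation=1):
    """Causal dilated 1D convolution: the output at time t depends only on
    inputs at times <= t, with kernel taps spaced `dilation` steps apart."""
    k = len(w)
    pad = (k - 1) * dilation
    # Left-pad with zeros so the output is causal and has the same length.
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros_like(x, dtype=float)
    for t in range(len(x)):
        # Taps at positions t, t - d, t - 2d, ... of the original signal.
        y[t] = sum(w[i] * xp[pad + t - i * dilation] for i in range(k))
    return y

x = np.arange(8, dtype=float)   # toy input sequence 0..7
w = np.array([1.0, 1.0])        # kernel of size 2
print(causal_dilated_conv1d(x, w, dilation=1))  # y[t] = x[t] + x[t-1]
print(causal_dilated_conv1d(x, w, dilation=2))  # y[t] = x[t] + x[t-2]
```

Stacking such layers with growing dilation (1, 2, 4, ...) is what gives a TCN its long effective receptive field without recurrence.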
Abstract:
A huge number of videos are posted every day on social media platforms such as Facebook and YouTube. This makes the Internet an unlimited source of information. In the coming decades, coping with such information and mining useful knowledge from it will be an increasingly difficult task. In this paper, we propose a novel methodology for multimodal sentiment analysis that harvests sentiments from Web videos using a model that draws on audio, visual and textual modalities as sources of information. We used both feature-level and decision-level fusion methods to merge affective information extracted from multiple modalities. A thorough comparison with existing works in this area is carried out throughout the paper, which demonstrates the novelty of our approach. Preliminary comparative experiments with the YouTube dataset show that the proposed multimodal system achieves an accuracy of nearly 80%, outperforming all state-of-the-art systems by more than 20%. (C) 2015 Elsevier B.V. All rights reserved.
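The two fusion strategies mentioned above can be sketched as follows; the feature vectors, per-modality scores, and weights are hypothetical placeholders for illustration, not values from the paper:

```python
import numpy as np

# Hypothetical per-modality feature vectors for one video clip.
audio_feat  = np.array([0.2, 0.7])
visual_feat = np.array([0.9, 0.1, 0.4])
text_feat   = np.array([0.5])

# Feature-level (early) fusion: concatenate the modality features into a
# single vector and feed it to one classifier.
fused = np.concatenate([audio_feat, visual_feat, text_feat])
print(fused.shape)  # (6,)

# Decision-level (late) fusion: each modality produces its own sentiment
# score, and the scores are combined, here by a weighted average.
scores  = {"audio": 0.6, "visual": 0.8, "text": 0.7}   # hypothetical
weights = {"audio": 0.2, "visual": 0.5, "text": 0.3}
decision = sum(weights[m] * scores[m] for m in scores)
print(round(decision, 2))  # 0.73
```

Early fusion lets one model learn cross-modal interactions, while late fusion keeps the modality classifiers independent and only merges their outputs.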
Abstract:
Because of the growth of the business sector dealing in the distribution of movies, software, music, and other content, a very large amount of content has accumulated. Accordingly, recommendation systems that guide users toward relevant content have become more important. In distribution businesses, accurate content recommendations are required to acquire and retain users. To establish a highly accurate recommendation system, the recommended content must be accurately classified. As classification methods, techniques such as naive Bayes, SGD (stochastic gradient descent), and SVM (support vector machine) are mainly utilized. If all of the information on recommended subjects is applied in the classification process, high accuracy can be expected, but heavy computation, long service times, and low scalability are incurred. Given this inefficiency, effective classification that uses the metadata of content is required. Metadata are expressed in the forms of the domain concept, relation, type, and attribute to allow the complicated relations between multimodal data (text, images, and video) to be processed efficiently. Most classification systems use single-modal data to express one piece of knowledge for an item in a domain. Single-modal data are limited in terms of improving classification accuracy because they do not include the useful information provided by different knowledge types. Therefore, in this paper, we propose MMCNet, a deep learning-based multimodal classification model that uses dynamic knowledge. The proposed method consists of a classification model that applies a human-learning-principle-based CNN (convolutional neural network) to multimodal data combining text and image knowledge. Using a Web robot agent, multimodal data are collected from the TMDb (The Movie Database) data set, which includes a variety of single-modal data.
In the preprocessing procedures, knowledge integration, knowledge conversion, and knowledge reduction are performed to create a quantified knowledge base. To handle text data, sentences are refined through morphological analysis and converted to numerical vectors by using word embedding. Image data are converted to numerical vectors using a vector-conversion library. The converted feature vectors are used to create multimodal learning data, and the classification model is trained on them. To reduce memory and computation demands, vector-model-based meta-knowledge is expanded through expression, conversion, alignment, inference, and deep learning. To evaluate its performance, the proposed model was compared with conventional classification methods in terms of accuracy, recall, and F1-score. According to this evaluation, the proposed classification model achieves higher accuracy, recall, and F1-score than the conventional methods. In addition, the proposed model was implemented as a deep learning-based multimodal classification system with a graphical user interface that allows users to provide feedback about the classification results by adjusting classification parameters. Through the convergence of the knowledge bases of various domains and multimodal deep learning, the dynamic knowledge that influences user preference is inferred.
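The text-to-vector step described above (word embeddings averaged into a sentence vector, then joined with an image feature vector to form one multimodal training row) might look roughly like this; the embedding table, tokens, and image vector are invented for illustration and are not taken from MMCNet:

```python
import numpy as np

# Hypothetical word-embedding table (word -> 3-dimensional vector).
embedding = {"space": np.array([0.1, 0.8, 0.0]),
             "drama": np.array([0.7, 0.1, 0.2]),
             "epic":  np.array([0.3, 0.6, 0.5])}

def text_to_vector(tokens):
    """Sentence -> numerical vector: average of its word embeddings."""
    vecs = [embedding[t] for t in tokens if t in embedding]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

text_vec  = text_to_vector(["space", "epic"])   # refined movie overview
image_vec = np.array([0.4, 0.2, 0.9, 0.1])      # stand-in for poster features
sample = np.concatenate([text_vec, image_vec])  # one multimodal training row
print(sample.shape)  # (7,)
```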
Abstract:
Classification is a fundamental problem in machine learning and data mining. This paper applies a stochastic optimization model to classification problems. The proposed maximum entropy estimated distribution model uses a probabilistic distribution to represent the solution space and a sampling technique to explore the search space. This paper demonstrates the application of the proposed maximum entropy estimated distribution model to improve linear discriminant function and rule induction methods. In addition, this paper compares the proposed classification model with decision trees. It shows that the proposed model is preferable to the decision tree C4.5 in the following cases: i) when a prior distribution of the classification is available; ii) when no assumption is made about the underlying classification structure; and iii) when the classification problem is multimodal in nature.
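The general idea of an estimated distribution model, namely keeping a probability distribution over the solution space, sampling candidates from it, and refitting it on the best samples, can be sketched on a toy OneMax objective; the update rule, rates, and bounds below are assumptions for illustration and not the paper's maximum entropy formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

def umda_onemax(n_bits=20, pop=60, elite=15, iters=30):
    """Sketch of an estimation-of-distribution search: maintain a
    probability vector p over bits, sample candidate solutions from it,
    and refit p on the best (elite) samples."""
    p = np.full(n_bits, 0.5)                         # max-entropy start
    for _ in range(iters):
        samples = rng.random((pop, n_bits)) < p      # sample solutions
        fitness = samples.sum(axis=1)                # OneMax objective
        best = samples[np.argsort(fitness)[-elite:]] # select elites
        p = 0.7 * p + 0.3 * best.mean(axis=0)        # update distribution
        p = p.clip(0.05, 0.95)                       # keep some entropy
    return samples[np.argmax(fitness)]

solution = umda_onemax()
print(solution.sum())   # close to n_bits (all ones) after convergence
```

The clipping step is one simple way to keep the distribution from collapsing prematurely, which is the role entropy plays in this family of methods.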
Abstract:
It is widely believed that bike-sharing has the potential to encourage sustainable travel by combining the flexibility of cycling with the reliability of public transport. However, there is actually little empirical evidence concerning the scale of that effect. While many models of bike-sharing travel patterns include station locations, only a few have accounted for heterogeneity in service levels. This paper aims to fill this gap by examining the case of the bike-sharing system in the city of Poznan (536,000 inhabitants). We hypothesise that a higher number of bike-sharing trips could be found in places with a higher frequency of public transport. A model based on trip data mined through a web application programming interface (over 19,240,000 GPS-recorded bicycle positions) and open public transport frequency data from the general transit feed specification is used. Regression results show that, with control variables and spatial effects included, the frequency of public transport was significantly associated with the number of bike-sharing trips. A positive effect existed for short and medium trips, whereas no relationship was found for long trips. The findings support the view that public transport frequency is a relevant factor for bike-sharing and should be taken into account in planning.
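The core regression step (trip counts regressed on transit frequency while holding a control variable fixed) can be sketched with ordinary least squares on synthetic data; the variables, coefficients, and noise level below are fabricated for illustration and omit the spatial effects used in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic station-level data (hypothetical, for illustration only):
# trips ~ b0 + b1 * transit_frequency + b2 * population_density + noise
n = 200
freq    = rng.uniform(2, 30, n)        # public-transport departures/hour
density = rng.uniform(1, 10, n)        # control variable
trips   = 5 + 3.0 * freq + 1.5 * density + rng.normal(0, 2, n)

X = np.column_stack([np.ones(n), freq, density])   # design matrix
beta, *_ = np.linalg.lstsq(X, trips, rcond=None)   # OLS fit
print(np.round(beta, 2))  # estimates close to the true [5.0, 3.0, 1.5]
```

Because density is included in the design matrix, the coefficient on frequency is an estimate of its association with trips net of that control.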
Abstract:
In recent years, sensor and information technologies have greatly boosted the development of wearable, portable, and medical devices. A large number of multimodal biomedical signals, such as electroencephalography (EEG), electrocardiography (ECG), electrooculogram (EOG), and electromyography (EMG), have been recorded for rehabilitation analysis, mental disorder evaluation, emotion recognition, cardiovascular disease diagnosis, etc. In these research fields, most researchers use single-modal biomedical signals to build the corresponding analysis models. However, many clinical practice tasks, such as disease diagnosis, arrhythmia detection, and sleep condition monitoring, require multiple biomedical signal modalities together to make correct diagnoses, decisions, identifications, and predictions. It is noted that learning from multimodal biomedical signals offers the possibility of capturing corresponding information across modalities and gaining an in-depth understanding of the relationships among them.
Abstract:
Mining knowledge from multimedia databases has received increasing attention recently, since huge repositories have been made available by the development of the Internet. In this article, we exploit the relations among different modalities in a multimedia database and present a framework for the general multimodal data mining problem, in which image annotation and image retrieval are considered as special cases. Specifically, the multimodal data mining problem can be formulated as a structured prediction problem, where we learn the mapping from an input to structured and interdependent output variables. In addition, in order to reduce the demanding computation, we propose a new max margin structure learning approach called the Enhanced Max Margin Learning (EMML) framework, which is much more efficient, with a much faster convergence rate, than existing max margin learning methods, as verified through empirical evaluations. Furthermore, we apply the EMML framework to develop an effective and efficient solution to the multimodal data mining problem that is highly scalable in the sense that the query response time is independent of the database scale. The EMML framework allows efficient multimodal data mining queries over very large multimedia databases and outperforms many existing multimodal data mining methods in the literature that do not scale up at all. A performance comparison with a state-of-the-art multimodal data mining method is reported for real-world image databases.
Abstract:
A multimode network consists of heterogeneous types of actors with various interactions occurring between them. Identifying communities in a multimode network can help understand the structural properties of the network, address data-shortage and imbalance problems, and assist tasks like targeted marketing and finding influential actors within or between groups. In general, a network and its group structure often evolve unevenly. In a dynamic multimode network, both group membership and interactions can evolve, posing the challenging problem of identifying these evolving communities. In this work, we address this problem by employing temporal information to analyze a multimode network. A temporally regularized framework and its convergence property are carefully studied. We show that the algorithm can be interpreted as an iterative latent semantic analysis process, which allows for extensions to handle networks with actor attributes and within-mode interactions. Experiments on both synthetic data and real-world networks demonstrate the efficacy of our approach and suggest its generality in capturing evolving groups in networks with heterogeneous entities and complex relationships.